Written by Xiaodan Lou, NYU Stern School of Business | May 2017
The US housing market has recovered significantly since the last financial crisis, helped by historically low mortgage rates and sustained strength in the economy and labor market. We are interested in understanding the driving factors of the US housing market.
A strong economy, low housing supply and low mortgage rates are accommodative for home price appreciation.
GDP
Inflation (CPI)
Labor Market (Nonfarm payrolls)
Housing Supply (New Home Sale)
Mortgage Rate
All data used in this project come from the St. Louis Fed (FRED). I used the Python fredapi package to pull the data directly.
Home price appreciation is measured with the S&P/Case-Shiller U.S. National Home Price Index, a national composite of single-family home prices (the code stores it as HPI_20city). Source URLs are listed below. https://fred.stlouisfed.org/series/CSUSHPINSA
MtgRate: 30 Year Conventional Mortgage Rate https://fred.stlouisfed.org/series/MORTGAGE30US
CPI: Consumer Price Index for All Urban Consumers: https://fred.stlouisfed.org/series/CPIAUCSL
GDP: GDP (Seasonally Adjusted Annual Rate) https://fred.stlouisfed.org/series/GDP
NFP: Nonfarm payrolls (All Employees: Total Nonfarm Payrolls) https://fred.stlouisfed.org/series/PAYEMS
HouseSupply: Monthly Supply of Houses in the United States https://fred.stlouisfed.org/series/MSACSR
RentalVacancy: Rental Vacancy Rate for the United States https://fred.stlouisfed.org/series/RRVRUSQ156N
MedianSalePrice: Median Sales Price for New Houses Sold in the United States https://fred.stlouisfed.org/series/MSPNHSUS
HouseSold: New One Family Houses Sold: United States (HSN1F) https://fred.stlouisfed.org/series/HSN1F
To understand the impact of these factors on house price appreciation, we ran regressions and performed out-of-sample forecasting with the trained models.
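Out-of-sample forecasting on time-ordered data only makes sense if the test window lies strictly after the training window. The following is a minimal sketch of such a split on synthetic data; the helper name `chronological_split` and the toy values are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np
import pandas as pd

def chronological_split(X, y, pct_train=0.6):
    """Split time-ordered data so the test window starts where training ends."""
    train_size = int(len(X) * pct_train)
    return (X.iloc[:train_size], y.iloc[:train_size],
            X.iloc[train_size:], y.iloc[train_size:])

# Toy monthly data standing in for the real factor table (values are synthetic)
idx = pd.date_range("2016-01-01", periods=10, freq="MS")
X_toy = pd.DataFrame({"factor": np.arange(10.0)}, index=idx)
y_toy = pd.Series(np.arange(10.0) * 0.5, index=idx)

X_tr, y_tr, X_te, y_te = chronological_split(X_toy, y_toy)
print(len(X_tr), len(X_te))                  # sizes of the two windows
print(X_tr.index.max() < X_te.index.min())   # True when the windows do not overlap
```

Slicing both windows from the same `train_size` boundary guarantees that no observation is used for both fitting and evaluation.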
Regression Model
$$HomePrice = \beta_0 \cdot HouseSold + \beta_1 \cdot MedianSalePrice + \beta_2 \cdot RentalVacancy + \beta_3 \cdot HouseSupply + \beta_4 \cdot NFP + \beta_5 \cdot CPI + \beta_6 \cdot GDP + \beta_7 \cdot MtgRate$$
For the modeling part, we fit both OLS and Elastic Net (Elastic Net is described later).
We can use OLS (ordinary least squares regression) to model each variable's impact on house price appreciation (the percentage change of the house price index). To address the overfitting often seen in OLS, we can use regression with regularization, such as ridge or lasso regression, or their combination, Elastic Net.
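To make the contrast between plain OLS and a regularized fit concrete, here is a small NumPy sketch on synthetic data. It uses a ridge penalty as the simplest regularizer; the data, the assumed coefficients `true_beta` and the penalty strength `lam` are all illustrative choices, not values from this project.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))                 # synthetic regressors (not the housing data)
true_beta = np.array([0.5, -0.2, 0.1])      # assumed coefficients for the toy example
y = X @ true_beta + 0.05 * rng.normal(size=n)

# OLS without an intercept, matching the form of the regression equation above
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge: add lam * I to X'X before solving, which shrinks the coefficients
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

print("OLS  :", beta_ols)
print("Ridge:", beta_ridge)
```

The ridge coefficients are pulled toward zero relative to OLS. That shrinkage is what trades a little bias for lower variance when the regressors are noisy or correlated.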
In [2]:
import pandas as pd
import numpy as np
import os
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sb
from fredapi import Fred
fred = Fred(api_key=os.environ['FRED_API_KEY'])  # read the key from the FRED_API_KEY environment variable instead of hard-coding it
[start_date,end_date] = ['1990-01-01','2017-02-01']
In [269]:
HPI = pd.read_excel('data/HPI_PO_monthly_hist.xls',header=3,index_col=0).dropna()
HPI.columns = [i.replace('\n','') for i in HPI.columns]
HPI.plot(figsize=(20,8),fontsize=20)
Out[269]:
In [47]:
HPI_20city = fred.get_series('CSUSHPINSA',observation_start=start_date, observation_end=end_date)
HPI_20city.plot()
Out[47]:
In [282]:
GDP = fred.get_series('GDP',observation_start=start_date, observation_end=end_date)
GDP.pct_change().plot(kind = 'bar',x=None,figsize = (20,10))
plt.title('US Qtr GDP Percentage Change')
Out[282]:
In [288]:
CPI = fred.get_series('CPIAUCSL',observation_start=start_date, observation_end=end_date)
CPI.pct_change().plot()
plt.title('US CPI for All Items')
Out[288]:
In [301]:
MtgRate_Weekly = fred.get_series('MORTGAGE30US',observation_start='1989-12-01', observation_end=end_date)
MtgRate1 = MtgRate_Weekly.resample('M').mean()  # reduce weekly data to monthly by averaging
# relabel month-end dates as month-start dates by reusing the CPI index (assumes both series have the same length)
MtgRate = pd.Series(data=MtgRate1.values,index=CPI.index)
MtgRate.plot()
plt.title('US 30Y Fixed Rate Mortgage \n Mortgage Financing Cost is at Historically Low Level')
Out[301]:
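Weekly-to-monthly averaging, as done for the mortgage rate above, can also be sketched without relying on another series' index. The example below groups synthetic weekly values by calendar month and labels each average with the first day of that same month. The values are made up, and this is an alternative sketch rather than the notebook's own method.

```python
import pandas as pd

# Nine synthetic weekly observations starting Thursday 2016-12-01
weekly_idx = pd.date_range("2016-12-01", periods=9, freq="W-THU")
weekly = pd.Series([4.0, 4.1, 4.2, 4.1, 4.1, 4.2, 4.1, 4.1, 4.2], index=weekly_idx)

# Average within each calendar month, then convert the period labels to month-start dates
monthly = weekly.groupby(weekly.index.to_period("M")).mean()
monthly.index = monthly.index.to_timestamp()

print(monthly)
```

Because the labels come from the data's own dates, this approach cannot silently shift a month's average onto a neighbouring month.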
In [300]:
NFP = fred.get_series('PAYEMS',observation_start=start_date, observation_end=end_date)
NFP.pct_change().plot()
plt.title('US NonFarm Payroll \n Labor Market Steadily Recovered')
Out[300]:
In [299]:
HouseSupply = fred.get_series('MSACSRNSA',observation_start=start_date, observation_end=end_date)
HouseSupply.plot()
plt.title('US House Supply Level is Range-Bound \n No significant inventory issue')
Out[299]:
In [298]:
RentalVacancy = fred.get_series('RRVRUSQ156N',observation_start=start_date, observation_end=end_date)
RentalVacancy.plot()
plt.title('US Rental Market is Strong \n Given Declining House Ownership')
Out[298]:
In [297]:
MedianSalePrice = fred.get_series('MSPNHSUS',observation_start=start_date, observation_end=end_date)
(MedianSalePrice/CPI).plot()
plt.title('Ratio of Median Sales/CPI \n After inflation adjustment, sales price exceeds GFC high')
Out[297]:
In [57]:
HouseSold = fred.get_series('HSN1F',observation_start=start_date, observation_end=end_date)
HouseSold.plot()
Out[57]:
After visualizing the factors, we transform the data for modeling.
Cubic interpolation is commonly used in economics to align data of different frequencies, but it induces serial correlation in the data. RentalVacancy and GDP are quarterly, so they are interpolated to monthly frequency; the code below uses linear interpolation.
Modeling percentage changes is a standard way to relate changes in the explanatory variables to changes in house prices. Here we use the percentage change of HouseSold, MedianSalePrice, RentalVacancy, HouseSupply, NFP, CPI and GDP. The mortgage rate enters as a level, because the level is what matters for financing a house. Changes in the mortgage rate drive refinancing and MBS prepayment speeds, which are not our focus in this project.
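The transformation described here can be illustrated on toy data: a quarterly series is aligned with a monthly one, the gaps are filled by linear interpolation, percentage changes are taken, and a rate series is kept as a level in decimal form. All series names and values below are invented for the sketch.

```python
import pandas as pd

monthly_idx = pd.date_range("2016-01-01", periods=7, freq="MS")
quarterly_idx = pd.date_range("2016-01-01", periods=3, freq="QS")

# Toy series: values are made up for illustration
monthly = pd.Series([200.0, 202.0, 204.0, 203.0, 205.0, 207.0, 210.0], index=monthly_idx)
quarterly = pd.Series([100.0, 103.0, 106.0], index=quarterly_idx)
rate = pd.Series([4.0, 4.1, 4.0, 3.9, 4.0, 4.1, 4.2], index=monthly_idx)  # in percent

frame = pd.concat([monthly, quarterly], axis=1, keys=["monthly", "quarterly"])
filled = frame.interpolate(method="linear")  # fills the two missing months in each quarter
changes = filled.pct_change()                # month-over-month percentage change
changes["rate_level"] = rate / 100           # the rate enters as a level, in decimal form
changes = changes.dropna()                   # drop the first row, which has no prior month

print(filled["quarterly"].tolist())
print(changes.round(4))
```

Note that the interpolated quarterly series changes by a constant amount within each quarter, which is one visible form of the serial correlation that interpolation introduces.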
In [162]:
data_raw = [HPI_20city,HouseSold,MedianSalePrice,RentalVacancy,HouseSupply,NFP,CPI,GDP,MtgRate]
data0 = pd.concat(data_raw,axis=1,keys=['HPI','HouseSold','MedianSalePrice','RentalVacancy','HouseSupply','NFP','CPI','GDP','MtgRate'])
data0.head(20)
Out[162]:
In [171]:
data1 = data0.interpolate(method='linear').pct_change()
data1['MtgRate'] = MtgRate.values/100
data=data1.dropna()
data.head(12)
Out[171]:
In [172]:
data.describe()
Out[172]:
In [165]:
pd.plotting.scatter_matrix(data, alpha = 0.3, figsize = (14,14), diagonal = 'kde');  # pd.scatter_matrix was moved to pd.plotting
Typically, we can use OLS (ordinary least squares regression) to model each variable's impact on house price appreciation (the percentage change of the house price index). To address the overfitting often seen in OLS, we can use regression with regularization, such as ridge or lasso regression. However, we use elastic net regression here. As for Lasso vs ElasticNet, ElasticNet will tend to select more variables, hence lead to larger models, but also be more accurate in general. In particular, Lasso is very sensitive to correlation between features and might randomly select one out of two very correlated informative features, while ElasticNet will be more likely to select both, which should lead to a more stable model (in terms of generalization ability to new samples).
The comparison of Lasso and ElasticNet above is quoted from the scikit-learn website.
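For reference, the objective that scikit-learn documents for its elastic net estimator can be written as follows. The symbols are notation chosen here rather than taken from the notebook: $w$ is the coefficient vector, $n$ the number of training samples, $\alpha$ the overall penalty strength and $\rho$ the mixing weight (the `l1_ratio` argument).

$$\min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w\rVert_2^2$$

With $\rho = 1$ the penalty reduces to the lasso penalty, and with $\rho = 0$ it is a pure ridge penalty. A value of $\rho = 0.5$ weights the two parts equally.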
In [224]:
# Split data in train set and test set
data_mat = np.matrix(data)
X = data.iloc[:,1:]
y = data.iloc[:,0]
n_samples = X.shape[0]
pct_train = 0.6
train_size = int(n_samples*pct_train)
test_size = n_samples-train_size
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]  # the test set starts where the training set ends, so the two do not overlap
In [226]:
import statsmodels.api as sm
from IPython.display import HTML, display
# Fit by ordinary least squares; no constant is added, so the fitted model has no intercept
model = sm.OLS(y_train,X_train)
ols_model = model.fit()
Housing_OLS = ols_model.summary()
# Render the regression summary table as HTML
HTML(Housing_OLS.as_html())
Out[226]:
In [267]:
from sklearn.linear_model import ElasticNetCV
yp_ols = ols_model.predict(X_test)
# Equal L1/L2 mix; all other settings are left at the scikit-learn defaults (the 'normalize' argument no longer exists in recent versions)
ElasticNetModel = ElasticNetCV(l1_ratio=0.5)
YD_actual = pd.Series(y_test)
YD_OLS = pd.Series(yp_ols)
YD_OLS.index = YD_actual.index
y_pred_ENCV = ElasticNetModel.fit(X_train, y_train).predict(X_test)
YD_ElasticNet = pd.Series(y_pred_ENCV)
YD_ElasticNet.index = YD_actual.index
pd.concat([YD_actual,YD_OLS], keys=['Actual','OLS'],axis=1).plot(legend=True)
plt.title('HPI Predicted vs Actual by Regression')
Out[267]:
In [268]:
pd.concat([YD_actual,YD_ElasticNet], keys=['Actual','ElasticNet'],axis=1).plot(legend=True)
plt.title('HPI Predicted vs Actual by ElasticNet')
Out[268]:
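The plots above compare predictions with actuals visually. A single-number summary such as root mean squared error (RMSE) makes the comparison between models easier. The helper `rmse` and the numbers below are illustrative assumptions, not results from this analysis.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Synthetic monthly changes, for illustration only
actual = [0.004, 0.006, 0.005, 0.003]
model_a = [0.005, 0.005, 0.005, 0.005]
model_b = [0.004, 0.007, 0.004, 0.003]

print("model_a RMSE:", rmse(actual, model_a))
print("model_b RMSE:", rmse(actual, model_b))
```

In the notebook, the same helper could be applied as `rmse(y_test, yp_ols)` and `rmse(y_test, y_pred_ENCV)` to compare the two fitted models on the held-out data.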
This analysis explores the main drivers of the US housing market. We try to find the relationship between home price appreciation and several factors, such as general market conditions, the mortgage rate, and housing supply and demand. The continued growth has been fostered by several supporting factors: net demand running above the norm, a relentless decline in the market share of distressed home sales (now probing record lows), and easing bank lending standards. Based on the past 27 years of data, we note that general economic conditions are still the key driver of the US housing market, followed by the vacancy rate of existing homes. Higher mortgage rates can affect demand, but not excessively.
The Census Bureau collects new home sales based upon the following definition: "A sale of the new house occurs with the signing of a sales contract or the acceptance of a deposit." The house can be in any stage of construction: not yet started, under construction, or already completed. Typically about 25% of the houses are sold at the time of completion. The remaining 75% are evenly split between those not yet started and those under construction.
Existing home sales data are provided by the National Association of Realtors®. According to them, "the majority of transactions are reported when the sales contract is closed." Most transactions usually involve a mortgage which takes 30-60 days to close. Therefore an existing home sale (closing) most likely involves a sales contract that was signed a month or two prior.
Given the difference in definition, new home sales usually lead existing home sales regarding changes in the residential sales market by a month or two. For example, an existing home sale in January, was probably signed 30 to 45 days earlier which would have been in November or December. This is based on the usual time it takes to obtain and close a mortgage.
Source: https://www.census.gov/construction/nrs/new_vs_existing.html